(54) Title: SPEAKER VERIFICATION SYSTEM
(57) Abstract 

The invention relates to a method in a speaker verification system/speaker identification system which enables the system
operator to establish the identity of a customer by analysing a recording of the customer's speech. In such systems the amount
of voice data which must be collected from the customer is a decisive limit on the usefulness of the system. The more parameters
a model comprises, the better it can be adapted to given training data, but the more training data is needed to estimate all the
parameters reliably. The invention utilizes pre-trained reference models in a speaker model so that one can benefit from collected
data in addition to the information which the customer himself/herself speaks in his/her registration call. The central concept of
the invention is to organize said reference models in a set of pro-models and a set of anti-models, where the pro-models model a
quality which the customer has, and the anti-models model a quality which the customer does not have.
TITLE OF THE INVENTION: SPEAKER VERIFICATION SYSTEM

FIELD OF THE INVENTION 

The present invention relates to a method in a speaker
verification system/speaker identification system which
enables the system operator to establish the identity of a
customer by analysing a recording of the customer's voice
data.

PRIOR ART

In speaker verification systems, i.e. systems for automatic
verification of the identity of a speaker, the amount of
voice data which has to be collected from the customer is a
decisive limit on their usefulness. The more parameters a
model has, the better it can be adapted to given training
data, but at the same time more and more training data
(time which the customer has to spend in an initial phase)
is needed to estimate all parameters reliably.

A problem in connection with speaker verification is
consequently to create a sufficiently good model of a
customer's voice on the basis of as small an amount of
voice data as possible, in order to establish the identity
of the customer by analysing the recording of the
customer's voice data. By customer is here meant a user of
some service which requires a check of authorization.

One might consequently say that the above-mentioned
problem is a type of optimization problem, where it is a
matter of using the smallest possible amount of voice data
to reliably determine the identity of the speaker.

The aim of the present invention is consequently to
solve the above-mentioned problem.

SUMMARY OF THE INVENTION

The above-mentioned aim is achieved by means of a method
in a speaker verification system/speaker identification
system as presented in the characterizing part of patent
claim 1.

The present invention has the advantage, in comparison
with previous speaker verification systems/speaker
identification systems, that the identity of the speaker
can be found out quickly in spite of using a minimal
amount of voice data.

Further characteristics are given in the subclaims.

DETAILED DESCRIPTION OF EMBODIMENTS OF THE INVENTION

In technical contexts one usually makes a distinction
between speaker identification and speaker verification.

By speaker identification is meant a verification system
where a speaker identifies himself/herself by speaking any
sentences whatsoever, whereupon the identification system
analyses the voice and identifies characteristics of the
voice by which the speaker identification is performed.

By speaker verification is meant a verification system
where a speaker's identity is verified by the speaker
speaking (or entering by keypad) a specific piece of
information decided in advance, whereupon the verification
system directly confirms the authenticity of the
information (and identity) or rejects it (an example of
such a system is a cash dispenser; in Sweden "Bankomat").

The two systems basically relate to the same thing,
which is to distinguish and unambiguously establish a
speaker's identity.

It should consequently be understood that in the present
invention we equate the concepts "speaker verification" and
"speaker identification".

The invention is intended to be used in all speaker
verification systems, especially those used in a service
where one has access to information about the users.

The voice recording can either be made directly at the
equipment where the verification is performed, or be
transferred via different media. The medium can be the
telephone or other telecommunication media, including
computers.

In speaker verification systems today a "likelihood
normalization", i.e. a type of probability normalization,
is often utilized. In principle these speaker verification
systems function in the following way.

Let us suppose that a customer, for instance Leif,
intends to verify his identity by means of a speaker
verification system in order to get access to a certain
service. In this case it is assumed that Leif's voice
profile is already stored in a database belonging to the
speaker verification system.

When Leif speaks a voice message, for instance via a
telephone, to the speaker verification unit, the voice
profile is stored and analysed. The speaker verification
unit finds that the probability is very high that Leif is
a man over 40 years old. In addition the speaker
verification unit finds that Leif speaks staccato. The
speaker verification unit now searches the hierarchy of
different groups in the database and finds a group
comprising men over 40 who speak staccato.

This group is rather limited (for instance 40 persons),
and the speaker verification unit compares Leif's stored
voice profile with all voice profiles stored in this
particular group. With very high probability the speaker
verification unit consequently finds Leif's voice profile
in this group, whereupon identification is made.

The above-mentioned method is consequently based on the
speaker verification unit finding out, on the basis of
probability, to which group in a database a person, for
instance Leif, belongs. After that, the stored voice
profile is compared with all voice profiles in said group.

This method is of course considerably more efficient
than if the speaker verification unit were to compare the
stored voice profile indiscriminately with all voice
profiles in the database. That would take an enormous
amount of time if the database contained, for instance,
some thousand voice profiles.
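
The narrowing step described above can be sketched as
follows. This is only a minimal illustration under stated
assumptions: the Profile class, the group labels, the two
extra names, the feature vectors and the use of Euclidean
distance are all hypothetical and are not taken from the
system itself.

```python
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class Profile:
    name: str
    group: tuple     # hypothetical group labels, e.g. ("male", "over 40", "staccato")
    features: tuple  # hypothetical fixed-length voice feature vector


def build_index(profiles):
    """Arrange the stored profiles by group so that one group can be fetched directly."""
    index = {}
    for profile in profiles:
        index.setdefault(profile.group, []).append(profile)
    return index


def identify(index, group, features):
    """Compare the analysed voice only with the profiles in the estimated group.

    Returns the closest profile and the number of comparisons made.
    """
    candidates = index.get(group, [])
    if not candidates:
        return None, 0
    best = min(candidates, key=lambda p: dist(p.features, features))
    return best, len(candidates)


# Made-up example data: three stored profiles in two groups.
database = [
    Profile("Leif", ("male", "over 40", "staccato"), (0.9, 0.1)),
    Profile("Sven", ("male", "over 40", "staccato"), (0.2, 0.8)),
    Profile("Anna", ("female", "20-25", "legato"), (0.5, 0.5)),
]
index = build_index(database)
match, comparisons = identify(index, ("male", "over 40", "staccato"), (0.85, 0.15))
```

Only the two profiles in the estimated group are compared;
the third profile in the database is never touched.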

The present invention is a further development and
improvement of the above-mentioned method. It is based on
the fact that, by using pre-trained reference models as
components in a speaker model, one can benefit from
collected data in addition to the data which a customer
himself/herself speaks in his/her call to be recorded, and
by that reduce the length of this call. The central idea of
the invention is to organize these reference models in a
set of pro-models and a set of anti-models. The idea is
that the pro-models shall model a quality which the
customer has (for instance woman, between 20 and 25 years)
and the anti-models a quality which the customer does not
have (for instance man, not between 20 and 25 years).
Purely mathematically produced reference models, which
normally do not correspond to a distinguishable quality of
the customer, can also be used.

Complementary sets of pro- and anti-models should be
used. If the reference models correspond to concrete
qualities of the customer, a priori knowledge can in
addition be used to control the selection of reference
models. This knowledge can be made accessible in the
system in different ways.

A more detailed description is given further on in the
description.

In speaker verification contexts one uses, as mentioned
above, "likelihood normalization", where one normalizes
the contribution from a customer-specific model with one
or more "world models" or "impostor models", which in the
terminology used above are anti-models. The
customer-specific model corresponds to the function f_c in
equation (1) below. The novelty in (1) is therefore to
combine the anti-models with (complementary) pro-models.
Whether it is a novelty to make general use of a priori
knowledge to select reference models is doubtful, but the
arrangement with pro- and anti-models goes well with the
use of a priori knowledge. The theory of selecting an
optimal set of reference models and an associated
projection is certainly known in mathematics/signal
theory, and is consequently no novelty in itself, but the
application of this thinking in connection with speaker
verification is in our opinion a novelty.

In the following the invention will be described in
more detail.

Regard a speaker model as consisting of a) reference
models, and b) a projection on these reference models. The
projection can for instance be a weighted sum of
contributions from the reference models (a linear
combination). In addition a speaker model can of course
include model elements which are built exclusively from
voice material from the customer himself/herself and which
do not use reference models, but the following description
focuses on the part where some form of reference model is
included.

The reference models are normally trained on speech in
a database which is collected in the design phase of the
system, i.e. before a customer registers himself/herself in
the system. A reference model can either model I) some
predetermined entity (for instance "female speaker",
"speaker under 16 years" or "call from GSM-telephone"), or
II) something determined by mathematical optimization,
which therefore cannot readily be connected to specific
a priori knowledge as in case I.

Arrange the reference models in one set of pro-models
and one set of anti-models, and calculate the "hit
probability", P, of the total model so that contributions
from the pro-models increase P and contributions from the
anti-models reduce P. This procedure can be expressed
mathematically according to equation (1), where f_p and f_a
are functions of the contributions of the pro- and
anti-models respectively, and together constitute the
projection part of the total model. f_c is a function of
submodels trained on data from the customer
himself/herself. One can also make use of a logarithmic
variant of (1).

$$P = f_c(c_1, c_2, \ldots, c_N) \cdot \frac{f_p(p_1, p_2, \ldots, p_M)}{f_a(a_1, a_2, \ldots, a_Q)} \qquad (1)$$
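
Equation (1) and its logarithmic variant can be sketched in
code as follows. The description leaves the functions f_c,
f_p and f_a open; here all three are assumed, purely for
illustration, to be weighted sums of the model
contributions. The function names, the equal default
weights and the small constant eps are hypothetical. With
these choices the result is a normalized score and is not
necessarily confined to the range 0 to 1.

```python
import math


def weighted_sum(scores, weights=None):
    """Linear combination of model contributions (equal weights by default)."""
    if weights is None:
        weights = [1.0 / len(scores)] * len(scores)
    return sum(w * s for w, s in zip(weights, scores))


def hit_score(customer, pro, anti, eps=1e-12):
    """P = f_c * f_p / f_a, with f_c, f_p and f_a assumed to be weighted sums."""
    f_c = weighted_sum(customer)
    f_p = weighted_sum(pro)
    f_a = max(weighted_sum(anti), eps)  # guard against division by zero
    return f_c * f_p / f_a


def log_hit_score(customer, pro, anti, eps=1e-12):
    """Logarithmic variant: log P = log f_c + log f_p - log f_a."""
    f_c = max(weighted_sum(customer), eps)  # guard against log(0)
    f_p = max(weighted_sum(pro), eps)
    f_a = max(weighted_sum(anti), eps)
    return math.log(f_c) + math.log(f_p) - math.log(f_a)


# Made-up contributions: one customer submodel, one pro-model, one anti-model.
score = hit_score([0.5], [0.8], [0.2])
log_score = log_hit_score([0.5], [0.8], [0.2])
```

A larger pro-model contribution raises the score and a
larger anti-model contribution lowers it, which is the
behaviour required of P in the text.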



If reference models according to case I are used, one
can utilize a priori knowledge about the customer, for
instance knowledge about the speaker's sex, to build the
customer's speaker model by selecting the right reference
models.

Example: For a male speaker one can select a pro-model 
for "male speaker" and an anti-model for "female speaker". 
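
The selection in this example can be sketched as a simple
table lookup. The table of complementary pairs and the
function name are hypothetical, and the label "speaker 16
years or older" is an assumed complement that does not
appear in the description.

```python
# Hypothetical table pairing each reference model with its complement.
COMPLEMENT = {
    "male speaker": "female speaker",
    "female speaker": "male speaker",
    "speaker under 16 years": "speaker 16 years or older",
    "speaker 16 years or older": "speaker under 16 years",
}


def select_reference_models(known_qualities):
    """Pick a pro-model for each known quality and its complement as anti-model."""
    pro, anti = [], []
    for quality in known_qualities:
        if quality in COMPLEMENT:  # qualities without a reference model are skipped
            pro.append(quality)
            anti.append(COMPLEMENT[quality])
    return pro, anti


pro_models, anti_models = select_reference_models(["male speaker"])
```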

In this way one can easily benefit from a priori
knowledge when building one's speaker model. This
knowledge supplements the collected voice data, and one
can make a better functioning model with less collected
voice data from each customer. One suitably selects
complementary reference models as pro- and anti-models, as
in the example above. In this way one ought to get a more
reliable, balanced model, and one can get a discrimination
effect from the two complementary models "pulling" in
different directions.

The above-mentioned "a priori knowledge" may be
introduced into the system at different phases and in
different ways:

a) At the (first) call to be recorded, and thereby in
connection with the building of the first speaker model.
If the customer is already registered in the service and
therefore identifies himself/herself to the system at the
registration to the speaker verification system, one can
utilize customer information which is already stored in a
database, for instance sex and age. If the customer is not
pre-registered in the service, he/she may present his/her
civic registration number at the registration call, and
one can then get information about the sex by looking at
the civic registration number. One can also ask explicitly
about sex and age during the call.

b) After the first call to be recorded. In that case one
may already have taken the model into operation, and it
will be a matter of rebuilding the model with new
information. The information can for instance come from a
completed and returned form which the customer signs in
order to be allowed to continue with the service after an
initial phase. Adaptation of a speaker model, and
especially change of topology during its life cycle, is
treated in Telia's patent application No 9602622-4
relating to "Procedure and arrangement for adaption at for
instance speaker verification systems" (Case 520).
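
As an illustration of item a), the sex can be read from a
civic registration number as sketched below. The sketch
assumes the Swedish number format, in which the
second-to-last digit is odd for men and even for women. The
function name is hypothetical, the example numbers are made
up, and the check digit is not validated.

```python
def sex_from_civic_number(civic_number):
    """Read the sex from a civic registration number (Swedish format assumed).

    Accepts 10- or 12-digit numbers with or without a separator.
    """
    digits = [ch for ch in civic_number if ch.isdigit()]
    if len(digits) not in (10, 12):
        raise ValueError("expected a civic registration number with 10 or 12 digits")
    # Second-to-last digit: odd means male, even means female.
    return "male" if int(digits[-2]) % 2 == 1 else "female"


# Made-up numbers used only to exercise the function.
first = sex_from_civic_number("900101-1239")
second = sex_from_civic_number("19900101-1249")
```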

Instead of using pure a priori knowledge, one may
select one's reference models by calculating an optimal set
of reference models and an associated projection on these.

The above is only to be regarded as advantageous
embodiments of the invention, and the extent of protection
of the invention is defined only by what is indicated in
the following patent claims.
PATENT CLAIMS

1. Method in a speaker verification system/speaker
identification system which enables a system operator to
establish a customer's identity by means of analysis of a
recording of the customer's voice data,
characterized in that reference models are
organized in a set of pro-models and a set of anti-models
which constitute components in a speaker model, which
speaker model is utilized on a probability basis by said
speaker verification system/speaker identification system
to process and benefit from said collected voice data in
addition to the data which the customer himself/herself
speaks at his/her call to be recorded, whereby the
customer's identity can be established with minimal
recorded voice data.

2. Method according to patent claim 1,
characterized in that said pro-models model
qualities which the customer has, and said anti-models
model qualities which the customer does not have.

3. Method according to patent claim 1,
characterized in that said reference models are
trained to recognize any voice information whatsoever,
which voice information has been stored in a database at
the design phase of said speaker verification
system/speaker identification system before a customer has
had time to register himself/herself in said system.

4. Method according to any of the previous patent
claims, characterized in that said reference
model is arranged in said set of pro-models and said set of
anti-models, whereupon the "hit probability", P, of said
reference model is calculated so that contributions from
the pro-models increase P, and contributions from the
anti-models reduce P.

5. Method according to patent claim 4,
characterized in that the "hit probability" of
said total reference model, i.e. the total probability that
a certain customer belongs to a certain category, is given
by the formula:

$$P = f_c(c_1, c_2, \ldots, c_N) \cdot \frac{f_p(p_1, p_2, \ldots, p_M)}{f_a(a_1, a_2, \ldots, a_Q)}$$

where P corresponds to the "hit probability" of the
reference model, f_p and f_a are functions of the
contributions of the pro- and anti-models respectively,
which together constitute the projection part of the total
model, and f_c is a function of submodels trained on voice
data from the customer himself/herself.